Electricity Consumption Analysis

Data Section

The Electricity Consumption and Occupancy (ECO) dataset is a publicly available dataset that contains information about the electricity consumption and occupancy of residential households over a period of time. The dataset was created to support research on energy management and sustainability in buildings.

The ECO dataset includes data on the electricity consumption of various systems and devices within the building, such as lighting, HVAC, and plug loads. The data is collected at a high frequency, typically every second of the day. The dataset also includes information on the occupancy of the building at different times of the day, which is typically measured using occupancy sensors or manually recorded data.

The ECO dataset is available in several different formats, including raw data files and pre-processed data files. The raw data files contain the original data collected from the building’s meters and sensors, while the pre-processed data files contain aggregated data that has been cleaned and normalized for easier analysis. The pre-processed data files may also include additional information, such as weather data, to help contextualize the energy consumption and occupancy data.

The ECO dataset has been used in a variety of research studies related to building energy management and sustainability. For example, it has been used to develop and test algorithms for energy management systems, to analyze patterns in energy consumption and occupancy, and to evaluate the effectiveness of energy-saving interventions. It is a valuable resource for researchers and practitioners in this field because it provides a detailed view of energy consumption and occupancy in real-world households.

For this homework, I focus on Household 5. First, I look at the overall smart meter data for the household from 27.06.12 to 31.01.13. Then I dive deeper into the dataset by taking one day and examining how the energy consumption of the different appliances in the household varies throughout that day.

Data Science Questions

  • What is the average total power consumption and average power consumption for phases 1, 2, and 3 for Household 5 from 27.06.12 to 31.01.13, based on smart meter data?

  • What is the energy consumption over time for each appliance in a household, and how does the consumption of each appliance compare to the others?

Code
import pandas as pd
import numpy as np
from pathlib import Path
import altair as alt
import datetime
import plotly.graph_objects as go
import plotly.subplots as sp
from functools import reduce
alt.data_transformers.enable('default', max_rows=None)

Data Preparation and EDA for Question 1

Code
# Path of the data files
path = r'/Users/aanchaldusija/Downloads/hw4-spring-2023-aanchal-dusija/eco/05_sm' 

# Collect the CSV files in that directory (use .rglob instead to include subdirectories)
files = Path(path).glob('*.csv')
Code
# concatenate all the files into one dataframe and add a new column with the filename
df5 = pd.concat((pd.read_csv(f, header=None).assign(filename=f.name) for f in files), ignore_index=True)
Code
# create average of every column based on the filename and show filename column
# this created the average of all the days for each column
daywisedf5 = df5.groupby('filename').mean().reset_index()
Code
# remove .csv from filename (literal match, not a regex) and convert to datetime
daywisedf5['filename'] = daywisedf5['filename'].str.replace('.csv', '', regex=False)
daywisedf5['filename'] = pd.to_datetime(daywisedf5['filename'], format='%Y-%m-%d')
Code
# change the column names
daywisedf5.columns = ['Date', 'powerallphases', 'powerl1', 'powerl2', 'powerl3', 'currentneutral', 'currentl1', 'currentl2', 'currentl3', 'voltagel1', 'voltagel2', 'voltagel3', 'phaseanglevoltagel2l1', 'phaseanglevoltagel3l1', 'phaseanglecurrentvoltagel1', 'phaseanglecurrentvoltagel2', 'phaseanglecurrentvoltagel3']
Code
# keep only the columns that are needed (copy so later assignments do not act on a view)

daywisedf5 = daywisedf5[['Date', 'powerallphases', 'powerl1', 'powerl2', 'powerl3']].copy()
Code
# Scale all power columns (W = J/s) by 0.000277778 = 1/3600, the joule-to-watt-hour factor

# power
daywisedf5['powerallphases'] = daywisedf5['powerallphases'] * 0.000277778
daywisedf5['powerl1'] = daywisedf5['powerl1'] * 0.000277778
daywisedf5['powerl2'] = daywisedf5['powerl2'] * 0.000277778
daywisedf5['powerl3'] = daywisedf5['powerl3'] * 0.000277778
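
For reference, the usual way to express a day's average power as energy for that day is to multiply by the number of hours and divide by 1000. The sketch below assumes the smart meter readings are in watts and that each daily file covers a full 24 hours; the function name mean_watts_to_kwh_per_day is hypothetical and is not part of the original notebook.

```python
def mean_watts_to_kwh_per_day(mean_watts, hours=24.0):
    """Energy in kWh used over `hours` at an average power of `mean_watts`."""
    return mean_watts * hours / 1000.0


# Example: an average draw of 500 W sustained over a full day
print(mean_watts_to_kwh_per_day(500.0))  # 12.0
```

Applied to a column of daily mean power, the same arithmetic would give one kWh-per-day value per row.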

Snippet of the data

Code
daywisedf5.head()
        Date  powerallphases   powerl1   powerl2   powerl3
0 2012-06-27        0.219594  0.067886  0.052316  0.099392
1 2012-06-28        0.168301  0.048840  0.022499  0.096962
2 2012-06-29        0.179872  0.060406  0.020801  0.098665
3 2012-06-30        0.172266  0.054371  0.022081  0.095814
4 2012-07-01        0.157382  0.042863  0.022073  0.092445
Code
# Data summary
daywisedf5.describe()
       powerallphases     powerl1     powerl2     powerl3
count      215.000000  215.000000  215.000000  215.000000
mean         0.219348    0.077941    0.043770    0.097632
std          0.064636    0.030438    0.024046    0.022852
min          0.090201    0.015376    0.010201    0.051423
25%          0.172459    0.052027    0.024422    0.082695
50%          0.205051    0.073857    0.038741    0.095814
75%          0.244438    0.099169    0.056128    0.106133
max          0.500702    0.168634    0.129180    0.202888

Correlation Matrix of the data

Code
# Correlation matrix (numeric columns only, so the Date column is excluded)
corr = daywisedf5.corr(numeric_only=True)
corr.style.background_gradient(cmap='coolwarm')

# Heatmap
fig = go.Figure(data=go.Heatmap(
    z=corr.values,
    x=corr.columns,
    y=corr.columns,
    colorscale='Viridis'))

# Add plot title
fig.update_layout(
      title_text="Correlation Matrix",
      title_x=0.5,
      title_font_size=20,
      title_font_color="black",
      title_font_family="Arial",
      title_xanchor="center",
      title_yanchor="top",
      title_y=0.95,
      title_pad={"t": 10, "b": 0},
      width=800,
      height=600,
      margin=dict(l=50, r=50, b=100, t=100, pad=4),
      paper_bgcolor="white",
      plot_bgcolor="white",
      font=dict(
            family="Arial",
            size=12,
            color="black"
      )
)

fig.show(renderer='notebook')

From the correlation matrix, we can observe that the power consumption of each phase is highly correlated with the total power consumption. This is expected, as the total power consumption is the sum of the power consumption of the three phases. The phases are also highly correlated with each other, which suggests that the load on the three phases tends to rise and fall together from day to day.
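
Part of the correlation with the total is structural: a sum correlates with each of its parts even when the parts are unrelated. The sketch below illustrates this with synthetic values (not ECO data), assuming three independent random phases; the column names mirror the ones used above, and the random seed is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Three independent synthetic "phases" for 215 days (made-up values)
demo = pd.DataFrame({
    "Date": pd.date_range("2012-06-27", periods=215, freq="D"),
    "powerl1": rng.uniform(0, 1, 215),
    "powerl2": rng.uniform(0, 1, 215),
    "powerl3": rng.uniform(0, 1, 215),
})
demo["powerallphases"] = demo[["powerl1", "powerl2", "powerl3"]].sum(axis=1)

# numeric_only=True leaves the Date column out of the matrix
demo_corr = demo.corr(numeric_only=True)
print(demo_corr.round(2))
```

In this synthetic case the phases are nearly uncorrelated with one another, yet each is clearly correlated with the total, so the phase-to-phase correlations are the more informative part of the real matrix.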

Data Preparation for Question 2

Code
import pandas as pd

dates = ["2012-09-27"]
plugs_by_date = {}

for date in dates:
    data_list = []
    for i in range(1, 9):
        if i != 3:  # Plug 03 is skipped; the seven remaining plugs match the column names below
            file_path = f"~/Downloads/hw4-spring-2023-aanchal-dusija/eco/05_plugs/0{i}/{date}.csv"
            data = pd.read_csv(file_path, header=None)
            data_list.append(data)

    # One column per plug, one row per second
    data_merged = pd.concat(data_list, axis=1)
    data_merged.columns = ["Tablet", "CoffeeMachine", "Microwave", "Fridge", "Entertainment", "PC", "Kettle"]

    plugs_by_date[date] = data_merged

# Only one day is analysed, so keep that day's DataFrame
data_plugs = plugs_by_date[dates[0]]
Code
# Adding time
start_time = pd.to_datetime("12:00:00 AM", format="%I:%M:%S %p")
end_time = pd.to_datetime("11:59:59 PM", format="%I:%M:%S %p")
time_var = pd.date_range(start=start_time, end=end_time, freq='1s').time
time_var = [dt.strftime('%H:%M:%S') for dt in time_var]
Code
# Adding time variable to data_plugs
data_plugs = pd.concat([pd.Series(time_var, name='time_var'), data_plugs], axis=1)
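
The time labels can also be produced in a single step. This is a minimal sketch of an alternative, assuming one reading per second for a full day; the variable name time_labels is hypothetical, and the date passed to date_range only anchors the range and does not appear in the labels.

```python
import pandas as pd

# One label per second for a single day: 00:00:00 through 23:59:59
time_labels = pd.date_range("2012-09-27", periods=24 * 60 * 60, freq="s").strftime("%H:%M:%S")

print(len(time_labels))                 # 86400
print(time_labels[0], time_labels[-1])  # 00:00:00 23:59:59
```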

Snippet of the data

Code
# snippet of data  
data_plugs.head()
   time_var   Tablet  CoffeeMachine  Microwave   Fridge  Entertainment       PC  Kettle
0  00:00:00  4.33249            0.0    4.44332  4.44546        8.69303  27.7405     0.0
1  00:00:01  6.45719            0.0    4.44332  4.44546        8.69303  27.7405     0.0
2  00:00:02  6.45719            0.0    6.57853  4.44546        8.69303  25.6201     0.0
3  00:00:03  4.33249            0.0    4.44332  4.44546        6.56679  25.6201     0.0
4  00:00:04  6.45719            0.0    6.57853  4.44546        8.69303  27.7405     0.0
Code
# data summary
data_plugs.describe()
             Tablet  CoffeeMachine     Microwave        Fridge  Entertainment            PC        Kettle
count  86400.000000   86400.000000  86400.000000  86400.000000   86400.000000  86400.000000  86400.000000
mean       5.010574       3.183719      5.442396     43.829714      10.142211     32.937567     -0.198049
std        1.041192      63.677580      1.065409     57.864966       7.151379     20.701026      0.399255
min        4.332490      -1.000000      4.443320      4.445460       6.566790     25.620100     -1.000000
25%        4.332490       0.000000      4.443320      4.445460       8.693030     27.740500      0.000000
50%        4.332490       0.000000      4.443320      4.445460       8.693030     27.740500      0.000000
75%        6.457190       0.000000      6.578530    112.800000       8.693030     27.740500      0.000000
max        8.581890    1485.480000      6.578530   1326.570000      44.838300    159.192000      6.580410
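
The minimum of -1 for CoffeeMachine and Kettle, and the negative mean for Kettle, look like a placeholder value rather than real power readings. Assuming -1 marks a missing measurement (an assumption that should be checked against the dataset documentation), a minimal sketch of masking such values before summarising could look like this; the helper mask_missing and the three-row sample are hypothetical.

```python
import numpy as np
import pandas as pd


def mask_missing(df, sentinel=-1):
    """Return a copy of df with the sentinel replaced by NaN in numeric columns."""
    out = df.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].replace(sentinel, np.nan)
    return out


# Made-up sample with one placeholder reading
sample = pd.DataFrame({
    "time_var": ["00:00:00", "00:00:01", "00:00:02"],
    "Kettle": [0.0, -1.0, 6.58],
})
cleaned = mask_missing(sample)

# NaN values are skipped by mean(), so the placeholder no longer drags the average down
print(cleaned["Kettle"].mean())
```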

Correlation Matrix of the data

Code
# Correlation matrix (numeric columns only, so the time_var column is excluded)
corr = data_plugs.corr(numeric_only=True)
corr.style.background_gradient(cmap='coolwarm')

# Heatmap
fig = go.Figure(data=go.Heatmap(
    z=corr.values,
    x=corr.columns,
    y=corr.columns,
    colorscale='Viridis'))
# Add plot title
fig.update_layout(
      title_text="Correlation Matrix",
      title_x=0.5,
      title_font_size=20,
      title_font_color="black",
      title_font_family="Arial",
      title_xanchor="center",
      title_yanchor="top",
      title_y=0.95,
      title_pad={"t": 10, "b": 0},
      width=800,
      height=600,
      margin=dict(l=50, r=50, b=100, t=100, pad=4),
      paper_bgcolor="white",
      plot_bgcolor="white",
      font=dict(
            family="Arial",
            size=12,
            color="black"
      )
)

fig.show(renderer='notebook')

From the correlation matrix, we can observe that the power consumption of the different appliances is largely uncorrelated. This is expected, as each appliance is used at different times of the day and there is no reason for their consumption to move together.

Results

Plot 1 - What is the average total power consumption and average power consumption for phases 1, 2, and 3 for Household 5 from 27.06.12 to 31.01.13, based on smart meter data?

The code generates a linked chart to display the average total power consumption and average power consumption for phases 1, 2, and 3 for Household 5 from 27.06.12 to 31.01.13, based on smart meter data.

The chart consists of two parts: a bar chart showing the average total power consumption per day for Household 5, and a grouped bar chart showing the average power consumption per day for each of the three power phases (l1, l2, and l3). The two charts are linked by a selection on the date variable (named brush in the code), which lets the user click a date of interest and see the corresponding data highlighted in both charts.

The bar chart shows the average total power consumption per day for Household 5, with the x-axis representing the date range from 27.06.12 to 31.01.13 and the y-axis representing the average energy consumption per day in kWh. When a date is selected, the opacity of the other bars is reduced so that the selected bar stands out, and the tooltip displays the date and the corresponding energy consumption.

The grouped bar chart shows the average power consumption per day for each of the three power phases (l1, l2, and l3) for Household 5, with the x-axis representing the same date range and the y-axis representing the average energy consumption per day in kWh. The chart is split into one panel per power phase, and each phase is drawn in a different color. When a date is selected, the opacity of the bars changes in the same way, and the tooltip displays the date and the corresponding energy consumption for each power phase.
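
The grouped chart depends on reshaping the three phase columns from wide to long form, which the Altair code does with transform_fold. As a rough illustration, the same reshape can be sketched in pandas with melt; the two rows below are taken from the data snippet shown earlier, and the names wide and long_form are hypothetical.

```python
import pandas as pd

# Two rows in the original wide layout: one column per phase
wide = pd.DataFrame({
    "Date": pd.to_datetime(["2012-06-27", "2012-06-28"]),
    "powerl1": [0.067886, 0.048840],
    "powerl2": [0.052316, 0.022499],
    "powerl3": [0.099392, 0.096962],
})

# Long layout: one row per (Date, phase) pair, matching as_=['variable', 'value']
long_form = wide.melt(
    id_vars="Date",
    value_vars=["powerl1", "powerl2", "powerl3"],
    var_name="variable",
    value_name="value",
)
print(long_form.shape)  # (6, 3)
```

With the data in this shape, the phase name can drive both the color and the column facet.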

Code
# AVERAGE TOTAL POWER AND POWER PHASES 1,2,3 FROM 27.06.12 to 31.01.13 FROM SMART METER DATA

# Create brush selection for the Date variable
brush = alt.selection_single(fields=['Date'], name='brush')

# First bar chart with powerallphases
bar = alt.Chart(daywisedf5).mark_bar().encode(
    x=alt.X('Date:T', axis=alt.Axis(title='Date', format='%b %y', tickCount=alt.TickCount(interval='month', step=1))),
    y=alt.Y('powerallphases:Q', axis=alt.Axis(title='Average Energy Per Day (kwh)')),
    tooltip=['Date', 'powerallphases'],
    color=alt.condition(brush, alt.value('black'), alt.value('black')),
    opacity=alt.condition(brush, alt.value(1), alt.value(0.2))
).interactive().add_selection(
    brush
).properties(
    height=200,
    width=400
)

# Grouped bar chart with powerl1, powerl2, and powerl3
grouped_bar_chart = alt.Chart(daywisedf5).mark_bar().encode(
    x=alt.X('Date:T', axis=alt.Axis(title='Date', format='%b %y', tickCount=alt.TickCount(interval='month', step=1))),
    y=alt.Y('value:Q', axis=alt.Axis(title='Average Energy Per Day (kwh)')),
    color='variable:N',
    column=alt.Column('variable:N', title='Power Phases'),
    tooltip=['Date', 'value:Q'],
    opacity=alt.condition(brush, alt.value(1), alt.value(0.2))
).transform_fold(
    ['powerl1', 'powerl2', 'powerl3'],
    as_=['variable', 'value']
).interactive().add_selection(
    brush
)


# make plot smaller for linked_charts
linked_charts = alt.vconcat(
    bar, grouped_bar_chart,
    center=True
).configure_title(
    fontSize=18,
    anchor='middle'
).properties(
    title=alt.TitleParams(text='HOUSEHOLD 5: AVERAGE ENERGY CONSUMPTION FROM 27.06.12 to 31.01.13 SMART METER DATA', anchor='middle', offset=20)
).configure_view(
    height=100,
    width=250
)
linked_charts

Rationale for design decisions:

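The graph visualises the average total power consumption and the average power consumption of each phase for Household 5.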

Visual encodings: The choice of bar charts for visualizing average power consumption was made to provide an easy-to-understand and clear representation of the data. Bar charts are effective in showcasing differences in values across categories or over time. In this case, the height of the bars represents the average power consumption, making it simple to compare values across dates or between power phases.

Interaction: A selection on the date was incorporated into the design to enable users to explore and focus on specific days. When a date is clicked in the bar chart, the corresponding data points in the grouped bar chart are highlighted, allowing users to examine the power consumption for phases 1, 2, and 3 in more detail. This interactive feature helps users gain a better understanding of the data.

Animation: The opacity of the bars changes upon brush selection, emphasizing the chosen data points and fading out the others. This visual cue guides the user’s attention to the selected data points and provides a smooth transition between different time periods.

Alternative considerations:

Line chart: A line chart could have been used instead of a bar chart to represent the average power consumption over time. Line charts are useful for showing trends over time. However, bar charts were chosen for their simplicity and ease of comparison between individual data points.

Stacked bar chart: A stacked bar chart could have been used to display the power consumption for phases 1, 2, and 3 in a single chart. However, a grouped bar chart was chosen to make it easier for users to compare the power consumption across phases, as the individual bars representing each phase are clearly visible and not overlapped.

The ultimate choices were made based on the goal of providing an intuitive, easy-to-understand, and interactive visualization that allows users to explore and compare the average power consumption for each phase in a household. The selected visual encodings, interaction techniques, and animations work together to achieve this goal effectively.

Plot 2 - What is the energy consumption over time for each appliance in a household, and how does the consumption of each appliance compare to the others?

The code generates a graph showing the energy consumption over time for each appliance in Household 5 on 27.09.12, as well as a dropdown menu allowing the user to compare the energy consumption of each appliance to the others.

The graph is created using the Plotly library in Python. Each appliance in the household is represented as a line, with the x-axis representing time and the y-axis representing power consumption in watts. The graph shows how the consumption of each appliance varies over the day, allowing the user to identify patterns or trends in energy usage.

The dropdown menu allows the user to select which appliances to display on the graph. The default setting is “All,” which displays all of the appliances in the household. However, the user can also select individual appliances to compare their energy consumption to the others.

The title of the graph is “Energy Consumption vs Time for Household 5 on 27.09.12,” and the x-axis is labeled “Time” while the y-axis is labeled “Consumption (Watts).” The dropdown menu is included with the graph and allows the user to easily compare the energy consumption of each appliance to the others.
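
Each dropdown entry works by passing Plotly a list of booleans, one per trace, that says which lines stay visible. The logic behind those lists can be sketched without Plotly; the helper visibility_masks is hypothetical and only mirrors the list comprehensions used in the plotting code.

```python
def visibility_masks(trace_names):
    """Map each dropdown label to a list of booleans, one per trace."""
    masks = {"All": [True] * len(trace_names)}
    for name in trace_names:
        # Only the trace whose name matches the selected label stays visible
        masks[name] = [other == name for other in trace_names]
    return masks


masks = visibility_masks(["Tablet", "Fridge", "Kettle"])
print(masks["All"])     # [True, True, True]
print(masks["Fridge"])  # [False, True, False]
```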

Code
# Energy Consumption vs Time for Household 5 for 27.09.12
fig = go.Figure()

# Add a trace for each column
for col in data_plugs.columns[1:]:
    fig.add_trace(go.Scatter(x=data_plugs['time_var'], y=data_plugs[col], mode='lines', name=col, visible=True))

# Create the list of buttons for the dropdown menu
buttons = [dict(label='All',
                method='update',
                args=[{'visible': [True for col in data_plugs.columns[1:]]}])]

for col in data_plugs.columns[1:]:
    buttons.append(dict(label=col,
                        method='update',
                        args=[{'visible': [True if trace_name == col else False for trace_name in data_plugs.columns[1:]]}]))

# Add the updatemenu with the dropdown options to the layout
fig.update_layout(
    title="Energy Consumption vs Time for Household 5 on 27.09.12",
    xaxis_title="Time",
    yaxis_title="Consumption (Watts)",
    updatemenus=[dict(active=0, buttons=buttons)]
)

fig.show(renderer='notebook')

Rationale for design decisions:

Visual encodings: The choice of a line chart for visualizing energy consumption over time was made to effectively represent the continuous nature of the data and to clearly show trends and fluctuations in consumption for each appliance. Line charts are particularly suitable for time series data as they connect data points, making it easier to identify patterns and trends.
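
With one reading per second, each line contains 86,400 points, which can make an interactive chart slow to render. One option, not used in the original notebook, is to average the readings per minute before plotting. The sketch below uses a synthetic series (not ECO data) and assumes a datetime index; the names per_second and per_minute are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic one-day series with one reading per second (made-up values)
index = pd.date_range("2012-09-27", periods=86400, freq="s")
per_second = pd.DataFrame({"Fridge": np.arange(86400, dtype=float)}, index=index)

# Average each minute: 86,400 rows become 1,440
per_minute = per_second.resample("1min").mean()
print(len(per_minute))  # 1440
```

The trade-off is that short spikes, such as a kettle switching on for a few seconds, are smoothed out by the averaging.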

Interaction: A dropdown menu was implemented to provide users with the option to select specific appliances or to view all appliances at once. This interaction technique allows users to focus on individual appliances while still being able to compare them to the others. By simplifying the selection process, users can efficiently explore the data and compare the energy consumption of different appliances.

Animation: The dropdown menu provides a smooth and straightforward way to toggle between the visibility of different appliances in the chart. As users select different options, the chart updates seamlessly, creating a responsive user experience.

Alternative considerations:

Stacked area chart: A stacked area chart could have been used to represent the energy consumption of all appliances in a single chart, showing the cumulative energy consumption over time. However, this approach may make it harder to compare individual appliances’ consumption as the areas may overlap or be obscured by others. The line chart with the dropdown menu was chosen to maintain clarity in the visualization.

Small multiples: Small multiples could have been used to display individual line charts for each appliance in a grid, allowing users to compare energy consumption across appliances visually. However, the chosen design with the dropdown menu provides a more compact and interactive way to explore the data while still offering the ability to compare different appliances.

The ultimate choices were made based on the goal of providing an intuitive, easy-to-understand, and interactive visualization that allows users to explore and compare the energy consumption of each appliance in a household. The selected visual encodings, interaction techniques, and animations work together to achieve this goal effectively.

References

  1. https://sites.google.com/view/activities-prediction-202b/implementation-details/electricity-data-visualization
  2. https://towardsdatascience.com/making-interactive-line-plots-with-python-pandas-and-altair-7ee1d109e3dd
  3. http://www.vs.inf.ethz.ch/publ/papers/kleiminger_ubicomp2015.pdf
  4. http://www.vs.inf.ethz.ch/publ/papers/beckel-2014-nilm.pdf
  5. https://stackoverflow.com/questions/39604271/conda-environments-not-showing-up-in-jupyter-notebook
  6. https://marckvaisman.georgetown.domains/anly503/readings/Altair-joss.pdf
  7. https://marckvaisman.georgetown.domains/anly503/readings/plotly-py.pdf